attritevis Package: An R Vignette

Attrition, the loss of study units from a sample, can occur throughout an experimental study and at times poses a threat to inference. Several studies, and accompanying R packages, provide ex-post solutions to missingness, such as double sampling or extreme value bounds. We provide visually based guidance for assessing the types of missingness a study may have, with a particular eye towards experimental and design adjustments a researcher can make after piloting a study.
attrition()

Transforms a dataframe into an attrition dataframe with the following variables:

- attrited – how many respondents attrited at each question.
- proportion – calculated as number of attrited respondents / total number of respondents entering the survey.
- proportion_q – calculated as number of attrited respondents / number of respondents entering into the question.

Arguments:

- data – a data.frame where variables are ordered by survey questions, such that earlier survey questions appear in smaller-valued columns.

attrition_table()

Same as attrition(), but converts the output into a table. Allows users to subset the table by treatment and control groups.

- data – a data.frame where variables are ordered by survey questions, such that earlier survey questions appear in smaller-valued columns.
- condition – a character string that corresponds to the treatment or control condition.
- treatment_var – a character string that corresponds to the name of the treatment variable.

plot_attrition()

- data – a data.frame where variables are ordered by survey questions, such that earlier survey questions appear in smaller-valued columns.
- freq – a boolean that sets the Y axis of the attrition plot. Default is freq = TRUE, which plots the frequency of attrited respondents. When freq = FALSE, the Y axis is the proportion of total N attrited, calculated as number of attrited respondents / total number of respondents entering the survey.
- treatment – a string of name(s) of question(s) in which treatments were administered. Marked in the resulting plot with a red vertical line.
- pre_treatment – a string of name(s) of pre-treatment question(s). Marked in the plot with a green vertical line.
- DV – a string of name(s) of outcome question(s). Marked in the plot with a blue vertical line.
- other_group_var – a string of name(s) of question(s) that correspond to the other_group category, specified by users. Marked in the plot with a purple vertical line.
- other_group – a string of the name of the group of variables specified in other_group_var. Note that both other_group and other_group_var must be specified to use either of them.

balance_cov()

- data – a data.frame, from which treatment and question are taken.
- treatment – a character string that corresponds to the name of the treatment variable. Note that values of this variable must be specified as treatment and control.
- question – a character string that corresponds to the name of the point in the survey (question) for which a balance test is required.
- factor – logical argument that specifies whether question is a factor. Default is factor = FALSE (i.e., question is numeric or integer).
- factor_name – character that corresponds to a specific factor level (e.g., female) if question is a factor (e.g., sex).

balance_attrite()

- data – a data.frame, from which treatment and question are taken.
- treatment – a character string that corresponds to the name of the treatment variable. Note that values of this variable must be specified as treatment and control.
- question – a character string that corresponds to the name of the point in the survey (question) for which a balance test is required.

bounds()

Calls the attrition package by Alex Coppock.

- data – a data.frame, from which treatment and DV are taken.
- treatment – a character string that corresponds to the name of the treatment variable. Note that values of this variable must be specified as treatment and control.
- DV – a character string that corresponds to the name of the outcome variable.
- type – character that corresponds to the type of bounds required ("Manski" or "Lee"). Default is type = "Manski".

Let's begin demonstrating the uses of attritevis with a working example. We load test data from Lo, Renshon, and Bassan-Nygate 2021 (study 5B), an experimental survey study on whether peer praise can encourage respondents to choose an empathy task.
The experiment manipulates peer praise and measures empathy in a behavioral task. There are two arms in the peer-praise randomization: peer praise and no praise (control). In the first arm, a word cloud of praise, drawn from real praise collected in a pilot study, is given for people who behave empathetically, with a line of text about peer group average thermometer ratings towards people who are empathetic: “Peers of yours on this platform have said they hold favorable feelings towards people who engage in empathetic behavior, with an average of 7.9, on a scale of 0 (least favorable) to 10 (most favorable). That same peer group provided real feedback for empathetic behavior which is pictured in the word cloud below”. The word cloud is presented in figure 1. Respondents in the control condition do not receive any additional information.
Figure 1: Word cloud of real praise presented to treated respondents.
Our outcome of interest is choosing to empathize with an image in a behavioral task. In the task, subjects choose between two “cards” a FEEL and a DESCRIBE task, that correspond to an empathy or objective task, in which they empathize/describe an image of a man. The cards are presented in figure 2. Below is a description of the survey, with information on the various variables collected.
Figure 2: Choice task FEEL and DESCRIBE cards.
After answering pre-treatment covariates, respondents in the study were asked to complete two practice rounds of the main empathy task. After completing the practice rounds, respondents complete three trials of the above-mentioned task. Before each task, respondents are randomized into treatment and control groups. Treated respondents receive the light-touch peer-praise treatment. During each trial, before respondents select between the FEEL and DESCRIBE tasks, happiness, the hypothesized mechanism, is measured. Treatment variables are labeled treat1, treat2, etc. Outcome variables, which are the choice-task card questions, are labeled cards1, cards2, etc. Mediators, which are measures of the emotion happiness, are labeled Happy_1_1, Happy_1_2, …, Happy_2_1, Happy_2_2, …, Happy_3_1, Happy_3_2, etc. After respondents complete all three trials, post-task and post-treatment covariates are collected. Importantly, the dataframe test_data is organized by the order of the survey questions. That is, if Q1 came before Q2 in the survey, the variable Q1 comes before the variable Q2 in the dataframe.
After loading the test data and ensuring that variables are ordered by survey questions, we may want to transform our dataframe into an attrition dataframe using the function attrition().
attrition_data <- attrition(test_data)
This function creates a frame that indicates, per variable, how many respondents attrited, as well as the proportion of total N attrited, calculated as number of attrited respondents / total number of respondents entering the survey.
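To build intuition for these quantities, the counting logic can be sketched in base R on a toy dataframe. This is our own illustration, not the package's internal code; it assumes a respondent is counted as attriting at the first question from which all of their remaining answers are missing.

```r
# Toy illustration of the counting behind attrition() (not the package's
# actual internals): a respondent attrites at the first question from
# which all of their remaining answers are NA.
toy <- data.frame(q1 = c(1, 1, 1, 1),
                  q2 = c(1, NA, 1, NA),
                  q3 = c(1, NA, NA, NA))

# Index of each respondent's last non-missing answer (0 if none)
last_answered <- apply(toy, 1, function(x) {
  ans <- which(!is.na(x))
  if (length(ans) == 0) 0 else max(ans)
})

# attrited at question j = respondents whose last answer was question j - 1
attrited <- sapply(seq_len(ncol(toy)), function(j) sum(last_answered == j - 1))
attrited                            # 0 at q1, 2 at q2, 1 at q3
proportion <- attrited / nrow(toy)  # share of the full sample, per question
```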
Using base R we can explore how many people attrited overall, and what proportion of the general population this is.
sum(attrition_data$attrited) #How many respondents attrited overall?
## [1] 129
sum(attrition_data$attrited)/nrow(test_data) #What proportion of the overall sample is this? (0.21)
## [1] 0.2067308
Next, we can look at specific variables, and learn whether respondents attrited. Let’s choose the variable cards_a to demonstrate. This is a variable that notes whether respondents clicked the “FEEL” or “DESCRIBE” button during their first practice round. Using base R we can extract the number of attrited respondents, as well as the proportion of total N attrited, for this question.
attrition_data[attrition_data$questions == 'cards_a', 'attrited']
## [1] 37
attrition_data[attrition_data$questions == 'cards_a', 'proportion']
## [1] 0.06
We learn that at the question cards_a, 37 respondents attrited from the survey. This is equivalent to 6% of the total number of respondents who entered the survey. Is this a lot, though? Where else do we see attrition in the study? To assess, we visualize attrition across the survey timeline.
We can further create a table version of this dataframe using the function attrition_table().
attrition_table(test_data)
| questions | attrited | proportion | proportion_q |
|---|---|---|---|
| consent | 0 | 0.00 | 0.00 |
| age | 3 | 0.00 | 0.00 |
| sex | 0 | 0.00 | 0.00 |
| education | 0 | 0.00 | 0.00 |
| state | 0 | 0.00 | 0.00 |
| income | 0 | 0.00 | 0.00 |
| part_id | 0 | 0.00 | 0.00 |
| race | 0 | 0.00 | 0.00 |
| religion | 0 | 0.00 | 0.00 |
| attrition_1 | 1 | 0.00 | 0.00 |
| attrition_2 | 6 | 0.01 | 0.01 |
| cards_a | 37 | 0.06 | 0.06 |
| pa | 0 | 0.00 | 0.00 |
| pb_1 | 0 | 0.00 | 0.00 |
| pb_2 | 0 | 0.00 | 0.00 |
| pb_3 | 0 | 0.00 | 0.00 |
| pc | 0 | 0.00 | 0.00 |
| cards_b | 0 | 0.00 | 0.00 |
| p2a | 0 | 0.00 | 0.00 |
| p2b_1 | 0 | 0.00 | 0.00 |
| p2b_2 | 0 | 0.00 | 0.00 |
| p2b_3 | 0 | 0.00 | 0.00 |
| p2c | 0 | 0.00 | 0.00 |
| treat1 | 0 | 0.00 | 0.00 |
| Happy_1_1 | 0 | 0.00 | 0.00 |
| Happy_1_2 | 0 | 0.00 | 0.00 |
| Happy_1_3 | 0 | 0.00 | 0.00 |
| cards1 | 0 | 0.00 | 0.00 |
| X1a | 0 | 0.00 | 0.00 |
| X1b_1 | 0 | 0.00 | 0.00 |
| X1b_2 | 0 | 0.00 | 0.00 |
| X1b_3 | 0 | 0.00 | 0.00 |
| X1c | 0 | 0.00 | 0.00 |
| treat2 | 0 | 0.00 | 0.00 |
| Happy_2_1 | 0 | 0.00 | 0.00 |
| Happy_2_2 | 0 | 0.00 | 0.00 |
| Happy_2_3 | 0 | 0.00 | 0.00 |
| cards2 | 0 | 0.00 | 0.00 |
| X2a | 0 | 0.00 | 0.00 |
| X2b_1 | 0 | 0.00 | 0.00 |
| X2b_2 | 0 | 0.00 | 0.00 |
| X2b_3 | 0 | 0.00 | 0.00 |
| X2c | 0 | 0.00 | 0.00 |
| treat3 | 0 | 0.00 | 0.00 |
| Happy_3_1 | 80 | 0.13 | 0.14 |
| Happy_3_2 | 0 | 0.00 | 0.00 |
| Happy_3_3 | 0 | 0.00 | 0.00 |
| cards3 | 0 | 0.00 | 0.00 |
| X3a | 0 | 0.00 | 0.00 |
| X3b_1 | 0 | 0.00 | 0.00 |
| X3b_2 | 0 | 0.00 | 0.00 |
| X3b_3 | 0 | 0.00 | 0.00 |
| post1 | 0 | 0.00 | 0.00 |
| post2_7 | 0 | 0.00 | 0.00 |
| post3 | 0 | 0.00 | 0.00 |
| post4 | 0 | 0.00 | 0.00 |
| post5 | 0 | 0.00 | 0.00 |
| post6 | 0 | 0.00 | 0.00 |
| post7 | 1 | 0.00 | 0.00 |
| post8 | 0 | 0.00 | 0.00 |
| post9 | 0 | 0.00 | 0.00 |
| post10 | 0 | 0.00 | 0.00 |
| post11_1 | 0 | 0.00 | 0.00 |
| post11_8 | 0 | 0.00 | 0.00 |
| post13_1 | 1 | 0.00 | 0.00 |
| post14_1 | 0 | 0.00 | 0.00 |
| post15_1 | 0 | 0.00 | 0.00 |
| post16_1 | 0 | 0.00 | 0.00 |
| post17 | 0 | 0.00 | 0.00 |
| ideology | 0 | 0.00 | 0.00 |
| trump_approval | 0 | 0.00 | 0.00 |
| pres_approval | 0 | 0.00 | 0.00 |
We can also use the arguments condition and treatment_var to subset the attrition table by the “treatment” or “control” condition. The treatment_var argument is a character string naming the variable in which the treatment was administered.
attrition_table(data= test_data,
condition = "treatment",
treatment_var = "treat1"
)
| questions | attrited | proportion | proportion_q |
|---|---|---|---|
| consent | 0 | 0.00 | 0.00 |
| age | 0 | 0.00 | 0.00 |
| sex | 0 | 0.00 | 0.00 |
| education | 0 | 0.00 | 0.00 |
| state | 0 | 0.00 | 0.00 |
| income | 0 | 0.00 | 0.00 |
| part_id | 0 | 0.00 | 0.00 |
| race | 0 | 0.00 | 0.00 |
| religion | 0 | 0.00 | 0.00 |
| attrition_1 | 0 | 0.00 | 0.00 |
| attrition_2 | 0 | 0.00 | 0.00 |
| cards_a | 0 | 0.00 | 0.00 |
| pa | 0 | 0.00 | 0.00 |
| pb_1 | 0 | 0.00 | 0.00 |
| pb_2 | 0 | 0.00 | 0.00 |
| pb_3 | 0 | 0.00 | 0.00 |
| pc | 0 | 0.00 | 0.00 |
| cards_b | 0 | 0.00 | 0.00 |
| p2a | 0 | 0.00 | 0.00 |
| p2b_1 | 0 | 0.00 | 0.00 |
| p2b_2 | 0 | 0.00 | 0.00 |
| p2b_3 | 0 | 0.00 | 0.00 |
| p2c | 0 | 0.00 | 0.00 |
| treatment_var | 0 | 0.00 | 0.00 |
| Happy_1_1 | 0 | 0.00 | 0.00 |
| Happy_1_2 | 0 | 0.00 | 0.00 |
| Happy_1_3 | 0 | 0.00 | 0.00 |
| cards1 | 0 | 0.00 | 0.00 |
| X1a | 0 | 0.00 | 0.00 |
| X1b_1 | 0 | 0.00 | 0.00 |
| X1b_2 | 0 | 0.00 | 0.00 |
| X1b_3 | 0 | 0.00 | 0.00 |
| X1c | 0 | 0.00 | 0.00 |
| treat2 | 0 | 0.00 | 0.00 |
| Happy_2_1 | 0 | 0.00 | 0.00 |
| Happy_2_2 | 0 | 0.00 | 0.00 |
| Happy_2_3 | 0 | 0.00 | 0.00 |
| cards2 | 0 | 0.00 | 0.00 |
| X2a | 0 | 0.00 | 0.00 |
| X2b_1 | 0 | 0.00 | 0.00 |
| X2b_2 | 0 | 0.00 | 0.00 |
| X2b_3 | 0 | 0.00 | 0.00 |
| X2c | 0 | 0.00 | 0.00 |
| treat3 | 0 | 0.00 | 0.00 |
| Happy_3_1 | 37 | 0.13 | 0.13 |
| Happy_3_2 | 0 | 0.00 | 0.00 |
| Happy_3_3 | 0 | 0.00 | 0.00 |
| cards3 | 0 | 0.00 | 0.00 |
| X3a | 0 | 0.00 | 0.00 |
| X3b_1 | 0 | 0.00 | 0.00 |
| X3b_2 | 0 | 0.00 | 0.00 |
| X3b_3 | 0 | 0.00 | 0.00 |
| post1 | 0 | 0.00 | 0.00 |
| post2_7 | 0 | 0.00 | 0.00 |
| post3 | 0 | 0.00 | 0.00 |
| post4 | 0 | 0.00 | 0.00 |
| post5 | 0 | 0.00 | 0.00 |
| post6 | 0 | 0.00 | 0.00 |
| post7 | 1 | 0.00 | 0.00 |
| post8 | 0 | 0.00 | 0.00 |
| post9 | 0 | 0.00 | 0.00 |
| post10 | 0 | 0.00 | 0.00 |
| post11_1 | 0 | 0.00 | 0.00 |
| post11_8 | 0 | 0.00 | 0.00 |
| post13_1 | 1 | 0.00 | 0.00 |
| post14_1 | 0 | 0.00 | 0.00 |
| post15_1 | 0 | 0.00 | 0.00 |
| post16_1 | 0 | 0.00 | 0.00 |
| post17 | 0 | 0.00 | 0.00 |
| ideology | 0 | 0.00 | 0.00 |
| trump_approval | 0 | 0.00 | 0.00 |
| pres_approval | 0 | 0.00 | 0.00 |
We may want to visualize attrition across the survey, to look at all the survey questions at once. The function plot_attrition allows us to plot attrition across survey questions, indicating pre-treatment, treatment, mediators, and outcome questions with different color vertical lines.
plot_attrition(data = test_data
,treatment = c("treat1", "treat2", "treat3")
,pre_treatment = c("consent", "age", "sex", "education", "state", "income", "part_id", "race", "religion", "attrition_1", "attrition_2", "cards_a", "pa", "pb_1", "pb_2", "pb_3", "pc", "cards_b", "p2a", "p2b_1", "p2b_2", "p2b_3", "p2c")
,DV = c("cards1", "cards2", "cards3")
,other_group = "Mediators"
,other_group_var = c("Happy_1_1", "Happy_1_2", "Happy_1_3",
"Happy_2_1", "Happy_2_2", "Happy_2_3",
"Happy_3_1", "Happy_3_2", "Happy_3_3")
,freq = TRUE
)
We can also specify freq = FALSE so that the Y axis will indicate the proportion of total N attrited.
plot_attrition(data = test_data
,treatment = c("treat1", "treat2", "treat3")
,pre_treatment = c("consent", "age", "sex", "education", "state", "income", "part_id", "race", "religion", "attrition_1", "attrition_2", "cards_a", "pa", "pb_1", "pb_2", "pb_3", "pc", "cards_b", "p2a", "p2b_1", "p2b_2", "p2b_3", "p2c")
,DV = c("cards1", "cards2", "cards3")
,other_group = "Mediators"
,other_group_var = c("Happy_1_1", "Happy_1_2", "Happy_1_3",
"Happy_2_1", "Happy_2_2", "Happy_2_3",
"Happy_3_1", "Happy_3_2", "Happy_3_3")
,freq = FALSE
)
Once we have identified the specific survey points where attrition takes place, we want to conduct balance tests at these points to ensure balance across treatment and control groups, and to learn if (and when) balance became an issue, which would indicate differential attrition. We can do this using the functions balance_cov() and balance_attrite(). We demonstrate balance_cov() with three covariates: age, sex, and ideology.
We begin with the covariate age, which was collected pre-treatment and is a numeric variable. In order to use the function balance_cov(), the values of the treatment variable must be specified as treatment and control. We define treat1 as the treatment variable and age as the question.
unique(test_data$treat1)
## [1] "treatment" "control" NA
balance_cov(data = test_data,
treatment = "treat1",
question = "age")
##
## Welch Two Sample t-test
##
## data: treat_data$question1 and control_data$question1
## t = -0.32688, df = 568.57, p-value = 0.7439
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -2.002600 1.431137
## sample estimates:
## mean of x mean of y
## 37.42361 37.70934
The output is a t-test that determines whether there is a difference between the average age of the control group and that of the treatment group. We learn that age is balanced across treatment and control groups, with a mean of approximately 37.4 years among treated respondents and 37.7 among control respondents (p = 0.74).
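For readers who want the same check without the package: the comparison balance_cov() reports here is a standard Welch two-sample t-test. A self-contained base R sketch, using simulated stand-in data (only the column names treat1 and age are taken from the example), would be:

```r
# Stand-in data mimicking test_data's treat1/age columns (assumed names);
# for numeric covariates, the reported test is a Welch two-sample t-test
set.seed(1)
toy <- data.frame(
  treat1 = rep(c("treatment", "control"), each = 50),
  age    = round(rnorm(100, mean = 37, sd = 12))
)
t.test(age ~ treat1, data = toy)  # Welch test by default (var.equal = FALSE)
```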
We can also use the function balance_cov() when the covariate (question) is a factor, but we must specify which level we are interested in. For example, suppose we want to test whether missingness at the survey question sex created observable differences across treatment and control groups. Sex is a factor variable with two levels: female and male. We can check whether the proportion of female respondents remains similar across groups. To do so, we set factor = TRUE and specify the factor_name (in this case, female).
balance_cov(data = test_data,
treatment = "treat1",
question = "sex",
factor = TRUE,
factor_name = "female")
##
## 2-sample test for equality of proportions with continuity correction
##
## data: x out of n
## X-squared = 1.1305, df = 1, p-value = 0.2877
## alternative hypothesis: two.sided
## 95 percent confidence interval:
## -0.12931498 0.03623038
## sample estimates:
## prop 1 prop 2
## 0.3576389 0.4041812
The output is a 2-sample proportion test. We learn that sex is also balanced between treatment and control, with similar proportions of females across the groups (p=0.3).
There are certain post-treatment variables for which we may want to ensure balance across treatment and control as well. Note, however, that these should be variables that we hypothesize would stay stable after treatment. For example, we occasionally include demographic questions at the end of the survey to avoid survey fatigue before treatments. In our running example, the ideology question was collected post-treatment, but we expect it to stay stable across treatment and control.
balance_cov(data = test_data,
treatment = "treat1",
question = "ideology")
##
## Welch Two Sample t-test
##
## data: treat_data$question1 and control_data$question1
## t = 1.023, df = 492.91, p-value = 0.3068
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -0.1660012 0.5266633
## sample estimates:
## mean of x mean of y
## 3.879518 3.699187
Next, we can check whether our treatment is correlated with attrition at any point in the survey. The balance_attrite() function converts the specified question into a binary variable, such that attrition = 1 and remaining in the survey = 0, and runs a logistic regression (regressing the specified question on the specified treatment) to examine whether treatment affects attrition.
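As we read this description, the underlying model can be sketched as follows. This is our own illustration with simulated stand-in data; the actual attritevis internals may differ.

```r
# Sketch of the logistic regression balance_attrite() is described as
# running (simulated stand-in data; not the package's actual code).
set.seed(2)
toy <- data.frame(treat1 = rep(c("treatment", "control"), each = 100))
# Simulate a post-treatment question with some respondents missing
toy$Happy_3_1 <- ifelse(runif(200) < 0.15, NA,
                        sample(1:7, 200, replace = TRUE))

# Step 1: recode the question so attrition = 1, remaining in survey = 0
toy$attrited <- as.integer(is.na(toy$Happy_3_1))

# Step 2: regress the indicator on treatment with a logit link
fit <- glm(attrited ~ treat1, data = toy, family = binomial(link = "logit"))
summary(fit)
```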
Using our visualization, we identified that attrition occurs at the post-treatment question Happy_3_1. We can use the function balance_attrite(), to examine whether our treatment caused attrition at this point in the survey:
balance_attrite(data = test_data,
treatment = "treat1",
question = "Happy_3_1")
##
## Call:
## glm(formula = question1 ~ treatment1, family = binomial(link = "logit"),
## data = data2)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -0.5676 -0.5676 -0.5244 -0.5244 2.0259
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -1.7441 0.1653 -10.552 <2e-16 ***
## treatment1treatment -0.1704 0.2415 -0.706 0.48
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 464.49 on 576 degrees of freedom
## Residual deviance: 463.99 on 575 degrees of freedom
## (47 observations deleted due to missingness)
## AIC: 467.99
##
## Number of Fisher Scoring iterations: 4
We learn that treat1 does not affect attrition at the variable Happy_3_1.
As we demonstrated above, attrition doesn’t seem to pose a threat to inference in our dataset. But what does it look like when attrition is an issue? We simulate attrition on test_data to demonstrate what this would look like.
First, we simulate attrition such that the probability of attrition among treated respondents is twice that of control respondents (0.3 vs. 0.15), in a new dataset called test_sim. We might see something like this if respondents are particularly taxed by a treatment in the survey and are therefore more likely to drop out after receiving it.
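The vignette does not show the simulation code, but differential attrition of this kind can be generated along the following lines. This is a self-contained sketch with toy data; only the column names mirror the running example.

```r
# Hedged sketch of simulating differential attrition (toy data; not the
# code used to build test_sim). Treated respondents drop out with
# probability 0.3, control respondents with 0.15, from Happy_3_1 onward.
set.seed(3)
n   <- 600
toy <- data.frame(
  treat1    = sample(c("treatment", "control"), n, replace = TRUE),
  Happy_3_1 = sample(1:7, n, replace = TRUE),
  cards1    = sample(0:1, n, replace = TRUE)
)
p_drop   <- ifelse(toy$treat1 == "treatment", 0.30, 0.15)
drop_out <- runif(n) < p_drop
post     <- which(names(toy) == "Happy_3_1"):ncol(toy)
toy[drop_out, post] <- NA  # attriters are NA from Happy_3_1 onward

# Attrition rates by arm should be near 0.30 and 0.15
tapply(is.na(toy$Happy_3_1), toy$treat1, mean)
```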
We demonstrate what attrition looks like in this new dataset using the plot_attrition() function:
plot_attrition(data = test_sim
,treatment = c("treat1", "treat2", "treat3")
,pre_treatment = c("consent", "age", "sex", "education", "state", "income", "part_id", "race", "religion", "attrition_1", "attrition_2", "cards_a", "pa", "pb_1", "pb_2", "pb_3", "pc", "cards_b", "p2a", "p2b_1", "p2b_2", "p2b_3", "p2c")
,DV = c("cards1", "cards2", "cards3")
,other_group = "Mediators"
,other_group_var = c("Happy_1_1", "Happy_1_2", "Happy_1_3",
"Happy_2_1", "Happy_2_2", "Happy_2_3",
"Happy_3_1", "Happy_3_2", "Happy_3_3")
,freq = FALSE
)
We learn that most respondents attrite at the post-treatment question Happy_3_1, and we conduct a balance test there. Note that Happy_3_1 is a mediator, and we expect our treatment to affect it. It thus does not make sense to use the balance_cov() function. Instead, we want to examine whether our treatment caused attrition, and thus use the function balance_attrite():
balance_attrite(data = test_sim,
treatment = "treat1",
question = "Happy_3_1")
##
## Call:
## glm(formula = question1 ~ treatment1, family = binomial(link = "logit"),
## data = data2)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -0.7339 -0.7339 -0.5676 -0.5676 1.9520
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -1.7441 0.1653 -10.552 < 2e-16 ***
## treatment1treatment 0.5700 0.2158 2.641 0.00826 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 565.05 on 576 degrees of freedom
## Residual deviance: 557.92 on 575 degrees of freedom
## (47 observations deleted due to missingness)
## AIC: 561.92
##
## Number of Fisher Scoring iterations: 4
We learn that treated respondents are more likely to attrite: treatment is positively associated with attrition, and the association is statistically significant.
Next, we use the function bounds() to get extreme value (Manski) bounds. This function calls the function estimator_ev from the attrition package by Alex Coppock. treatment is the assignment indicator (Z). DV is the outcome of interest (Y). Our bounds() function removes respondents who attrited pre-treatment and calculates R (the response indicator variable) based on missingness on the DV (missing = 0, response = 1), following the assumptions drawn by Manski.
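To make the logic concrete, extreme value bounds for a binary outcome can be computed by hand: missing outcomes are imputed at their most extreme possible values (0 or 1) in whichever direction widens or narrows the treatment-control difference. The sketch below is our own illustration of this idea, not a reimplementation of estimator_ev.

```r
# Hand-rolled extreme value (Manski) bounds for a binary outcome
# (illustrative only; bounds() delegates to attrition::estimator_ev)
manski_bounds <- function(Y, Z) {
  # Y: binary outcome with NA for attriters; Z: 1 = treatment, 0 = control
  lower <- mean(ifelse(is.na(Y[Z == 1]), 0, Y[Z == 1])) -
           mean(ifelse(is.na(Y[Z == 0]), 1, Y[Z == 0]))
  upper <- mean(ifelse(is.na(Y[Z == 1]), 1, Y[Z == 1])) -
           mean(ifelse(is.na(Y[Z == 0]), 0, Y[Z == 0]))
  c(low_est = lower, upp_est = upper)
}

Y <- c(1, 0, 1, NA, 0, 1, NA, 0)
Z <- c(1, 1, 1, 1, 0, 0, 0, 0)
manski_bounds(Y, Z)  # low_est 0.0, upp_est 0.5
```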
bounds(data = test_sim,
treatment = "treat1",
DV = "cards1")
## ci_lower ci_upper low_est upp_est low_var upp_var
## -0.013807860 0.133641578 0.059916859 0.059916859 0.001414914 0.001414914
The default bounds type is type = "Manski", but we can also specify type = "Lee" to get trimming (Lee) bounds. Because the monotonicity assumption that Lee bounds require cannot be defended here, Lee bounds cannot be obtained; we demonstrate the use of type = "Lee" in the next section.
Finally, to visualize Manski bounds on the estimate of treat1 on cards1, we plug the relevant values into a new dataframe (plot_bounds) and plot the estimate with Manski bounds alongside the original confidence intervals of the model from the simulated data.
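The plot_bounds dataframe is assembled by hand from the output shown above; the package does not create it for us.

```r
# Assembled by hand: point estimate with Manski bounds vs. the model's
# original confidence interval (values copied from the output above)
plot_bounds <- data.frame(
  estimate = c(0.072, 0.072),
  Type     = c("Manski Bounds", "Simulated Model"),
  low      = c(-0.014, -0.011),
  high     = c(0.134, 0.155)
)
```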
plot_bounds
| estimate | Type | low | high |
|---|---|---|---|
| 0.072 | Manski Bounds | -0.014 | 0.134 |
| 0.072 | Simulated Model | -0.011 | 0.155 |
ggplot(plot_bounds, aes(x = Type, y = estimate)) +
geom_errorbar(aes(ymin=low, ymax=high), width = 0.25,
size = 0.7, alpha = 1, color = "darkblue") +
geom_point(color = "red", size = 2, alpha = 0.6)+
labs(x = " ",
y = "Effect size of treat1 on cards1") +
theme(text = element_text(size = 10, family = "Times"),
panel.grid.major = element_blank(),
axis.text.x = element_text(size = 10),
panel.grid.minor = element_blank(),
panel.background = element_blank(),
axis.line = element_line(colour = "black"))
Next, we simulate attrition such that the probability of attrition among control respondents is twice that of treated respondents (0.3 vs. 0.15), in a new dataset called test_sim2. We might see something like this if positive emotions (like happiness) are ramped up by treatment, making attrition less likely.
We demonstrate what attrition looks like in this new dataset using the plot_attrition() function:
plot_attrition(data = test_sim2
,treatment = c("treat1", "treat2", "treat3")
,pre_treatment = c("consent", "age", "sex", "education", "state", "income", "part_id", "race", "religion", "attrition_1", "attrition_2", "cards_a", "pa", "pb_1", "pb_2", "pb_3", "pc", "cards_b", "p2a", "p2b_1", "p2b_2", "p2b_3", "p2c")
,DV = c("cards1", "cards2", "cards3")
,other_group = "Mediators"
,other_group_var = c("Happy_1_1", "Happy_1_2", "Happy_1_3",
"Happy_2_1", "Happy_2_2", "Happy_2_3",
"Happy_3_1", "Happy_3_2", "Happy_3_3")
,freq = FALSE
)
We see that the figure is quite similar to the one for the test_sim dataset. However, when we run a balance test, we learn that treatment and attrition are negatively associated, and the association is statistically significant:
balance_attrite(data = test_sim2,
treatment = "treat1",
question = "Happy_3_1")
##
## Call:
## glm(formula = question1 ~ treatment1, family = binomial(link = "logit"),
## data = data2)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -0.7844 -0.7844 -0.5244 -0.5244 2.0259
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -1.0211 0.1338 -7.633 2.30e-14 ***
## treatment1treatment -0.8934 0.2212 -4.040 5.35e-05 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 569.87 on 574 degrees of freedom
## Residual deviance: 552.67 on 573 degrees of freedom
## (49 observations deleted due to missingness)
## AIC: 556.67
##
## Number of Fisher Scoring iterations: 4
Next, we use the function bounds() to get extreme value (Manski) bounds.
bounds(data = test_sim2,
treatment = "treat1",
DV = "cards1")
## ci_lower ci_upper low_est upp_est low_var upp_var
## -0.002073252 0.146140229 0.072033488 0.072033488 0.001429615 0.001429615
We then use the bounds() function to get trimming (Lee) sharp bounds, which are narrower than Manski bounds.
bounds(data = test_sim2,
treatment = "treat1",
DV = "cards1",
type = "Lee")
## upper_bound lower_bound Out0_mono Out1L_mono Out1U_mono
## 0.08483438 0.04115226 0.30041152 0.34156379 0.38524590
## control_group_N treat_group_N Q f1 f0
## 243.00000000 254.00000000 0.03997366 0.11805556 0.15331010
## pi_r_1 pi_r_0
## 0.88194444 0.84668990
Finally, to visualize the different bounds on the estimate of treat1 on cards1, we plug all relevant values into a new dataframe (plot_bounds) and plot the estimate with Lee sharp bounds, Manski bounds, and the original confidence intervals of the model from the simulated data. Note that the Lee sharp bounds are much narrower.
plot_bounds
| estimate | Type | low | high |
|---|---|---|---|
| 0.07 | Manski Bounds | -0.002 | 0.146 |
| 0.07 | Lee sharp Bounds | 0.041 | 0.085 |
| 0.07 | Simulated Model | -0.014 | 0.153 |
ggplot(plot_bounds, aes(x = Type, y = estimate)) +
geom_errorbar(aes(ymin=low, ymax=high), width = 0.25,
size = 0.7, alpha = 1, color = "darkblue") +
geom_point(color = "red", size = 2, alpha = 0.6)+
labs(x = " ",
y = "Effect size of treat1 on cards1") +
theme(text = element_text(size = 10, family = "Times"),
panel.grid.major = element_blank(),
axis.text.x = element_text(size = 10),
panel.grid.minor = element_blank(),
panel.background = element_blank(),
axis.line = element_line(colour = "black"))